library(tibble)
library(knitr)
library(dplyr)
library(janitor)
library(tidyverse)
library(ggplot2)
library(DT)
library(gganimate)
library(patchwork)
library(gifski)DATA PROGRAMMING
PROJECT
Hello, I am Yusuf Emre Kilicer. Welcome to my project on Obesity. This project involves a comprehensive analysis of a dataset related to obesity, focusing on various factors such as eating habits, physical activity levels, and family history. The goal is to uncover patterns and insights that can contribute to our understanding of obesity and its associated factors.
In this project, I undertook the following steps:
Data Cleaning and Preprocessing
Exploratory Data Analysis
Data Manipulation
Visualization
Animated Visualizations
Tables
Throughout the project, you will find detailed explanations of each step, along with the corresponding R code and outputs. These elements together aim to provide a thorough and engaging analysis of obesity-related data.
I hope you find the project informative and engaging.
KIND REGARDS
EMRE
Pozdravljeni, sem Yusuf Emre Kilicer. Dobrodošli v mojem projektu o debelosti. Ta projekt vključuje celovito analizo podatkov, povezanih z debelostjo, s poudarkom na različnih dejavnikih, kot so prehranjevalne navade, stopnje telesne aktivnosti in družinska anamneza. Cilj je odkriti vzorce in vpoglede, ki lahko prispevajo k našemu razumevanju debelosti in z njo povezanih dejavnikov.
V tem projektu sem izvedel naslednje korake:
Čiščenje in predobdelava podatkov
Raziskovalna analiza podatkov
Manipulacija s podatki
Vizualizacija
Animirane vizualizacije
Tabele
V celotnem projektu boste našli podrobne razlage vsakega koraka, skupaj z ustrezno kodo v jeziku R in izhodi. Ti elementi skupaj si prizadevajo zagotoviti temeljito in zanimivo analizo podatkov, povezanih z debelostjo.
Upam, da boste projekt našli poučen in zanimiv.
PRIJAZNO SPOŠTOVANJE
EMRE
🤫 ADMISSION (click to expand) 🙈 !
I used google translate for greeting Slovenian.
Thus , I don’t know if it was translated correctly…
WELCOME TO MY PROJECT
TIME TO SHOW MY MASTERY OF R LANGUAGE
IMPORTING LIBRARIES
Below are the libraries necessary for this project:
IMPORTING DATASET
Let’s start with loading the data first.
my_data <-read.csv("/Users/y.emrekilicer/Desktop/DATA KONU PROJE ODEV/DATA PROJE/ObesityDataSet_raw_and_data_sinthetic.csv")NOW LET’S SEE WHAT WE GET
NOTE THAT EVERY NECESSARY INFORMATION ABOUT COLUMNS, UNITS ETC WILL BE PRINTED LATER.THIS IS JUST FOR GENERAL CHECKING IF DATA IMPORTED CORRECTLY.
my_data %>%
datatable(
caption="THE GENERAL OVERVIEW OF IMPORTED DATA",
options = list(
pageLength = 4,
autoWitdh = TRUE
))DATA CLEANING
Before starting the data manipulation, it is crucial to ensure no duplicated rows or missing values exist in the dataset.
# Clean and standardize column names to lowercase and remove special characters for consistency
my_data <- janitor::clean_names(my_data)
# Identify and display duplicated rows if any exist
duplicated_df <- my_data %>%
group_by_all() %>%
filter(n() > 1) %>%
ungroup()
duplicated_df %>%
head(4) %>%
gt::gt()| age | gender | height | weight | calc | favc | fcvc | ncp | scc | smoke | ch2o | family_history_with_overweight | faf | tue | caec | mtrans | n_obeyesdad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Female | 1.52 | 42 | Sometimes | no | 3 | 1 | no | no | 1 | no | 0 | 0 | Frequently | Public_Transportation | Insufficient_Weight |
| 21 | Female | 1.52 | 42 | Sometimes | no | 3 | 1 | no | no | 1 | no | 0 | 0 | Frequently | Public_Transportation | Insufficient_Weight |
| 25 | Female | 1.57 | 55 | Sometimes | yes | 2 | 1 | no | no | 2 | no | 2 | 0 | Sometimes | Public_Transportation | Normal_Weight |
| 25 | Female | 1.57 | 55 | Sometimes | yes | 2 | 1 | no | no | 2 | no | 2 | 0 | Sometimes | Public_Transportation | Normal_Weight |
# Remove duplicated rows using the distinct function
prev_num_rows <- nrow(my_data)
my_data <- distinct(my_data)
new_num_rows <- nrow(my_data)
message <- paste("The number of rows has been updated to ", new_num_rows, ". Previously, it was ", prev_num_rows, ".", sep="")
cat(message, "\n")The number of rows has been updated to 2087. Previously, it was 2111.
# Check for missing values in the dataset
if (any(is.na(my_data))) {
cat("Warning: The dataset contains missing values. Please handle them with caution.\n")
} else {
cat("There are no missing values in the dataset.\n")
}There are no missing values in the dataset.
GENERAL OVERVIEW
#| label: classnames
my_data <- clean_names(my_data)
# Retrieve and print the column names and their data types.
column_information <- my_data %>%
sapply(class) %>%
as.data.frame() %>%
rownames_to_column(var = "Column Name") %>%
rename(Class = 2)
# Display the columns and their classes in a table.
column_information %>%
datatable(caption = "Column Names and Their Classes",options= list( pageLength=8,
autoWidth =TRUE
))NOTE!!: TO SEE A DETAILED INFORMATION ABOUT THE COLUMN , CLICK HERE
cat(paste("The number of columns is ", ncol(my_data)))The number of columns is 17
cat(paste("The number of rows is ", nrow(my_data)))The number of rows is 2087
# Display the first 10 rows of our data for a preview.
my_data %>%
head(10) %>%
kable(format = "html", caption = "The Obesity DataSet Raw and Synthetic Data")| age | gender | height | weight | calc | favc | fcvc | ncp | scc | smoke | ch2o | family_history_with_overweight | faf | tue | caec | mtrans | n_obeyesdad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Female | 1.62 | 64.0 | no | no | 2 | 3 | no | no | 2 | yes | 0 | 1 | Sometimes | Public_Transportation | Normal_Weight |
| 21 | Female | 1.52 | 56.0 | Sometimes | no | 3 | 3 | yes | yes | 3 | yes | 3 | 0 | Sometimes | Public_Transportation | Normal_Weight |
| 23 | Male | 1.80 | 77.0 | Frequently | no | 2 | 3 | no | no | 2 | yes | 2 | 1 | Sometimes | Public_Transportation | Normal_Weight |
| 27 | Male | 1.80 | 87.0 | Frequently | no | 3 | 3 | no | no | 2 | no | 2 | 0 | Sometimes | Walking | Overweight_Level_I |
| 22 | Male | 1.78 | 89.8 | Sometimes | no | 2 | 1 | no | no | 2 | no | 0 | 0 | Sometimes | Public_Transportation | Overweight_Level_II |
| 29 | Male | 1.62 | 53.0 | Sometimes | yes | 2 | 3 | no | no | 2 | no | 0 | 0 | Sometimes | Automobile | Normal_Weight |
| 23 | Female | 1.50 | 55.0 | Sometimes | yes | 3 | 3 | no | no | 2 | yes | 1 | 0 | Sometimes | Motorbike | Normal_Weight |
| 22 | Male | 1.64 | 53.0 | Sometimes | no | 2 | 3 | no | no | 2 | no | 3 | 0 | Sometimes | Public_Transportation | Normal_Weight |
| 24 | Male | 1.78 | 64.0 | Frequently | yes | 3 | 3 | no | no | 2 | yes | 1 | 1 | Sometimes | Public_Transportation | Normal_Weight |
| 22 | Male | 1.72 | 68.0 | no | yes | 2 | 3 | no | no | 2 | yes | 1 | 1 | Sometimes | Public_Transportation | Normal_Weight |
DEEPING INTO DATASET
# Calculate descriptive statistics only for numeric columns in the dataframe my_data
additional_manipulate <- my_data %>%
# Apply summary functions across all numeric columns
summarise(across(
where(is.numeric), # Apply the functions only to numeric columns
list(
count = ~length(.), # Count total values
mean = ~mean(.), # Calculate mean
std = ~sd(.), # Calculate standard deviation
min = ~min(.), # Find minimum value
`50%` = ~median(.), # Median
max = ~max(.) # Find maximum value
)
)) %>%
# Keep data in a long format for clarity
pivot_longer(
cols = everything(),
names_to = c("Column", ".value"),
names_pattern = "(.*)_(.*)"
)
# Output the descriptive statistics table
additional_manipulate %>%
gt::gt()| Column | count | mean | std | min | 50% | max |
|---|---|---|---|---|---|---|
| age | 2087 | 24.3530898 | 6.36880144 | 14.00 | 22.847618 | 61.00 |
| height | 2087 | 1.7026741 | 0.09318594 | 1.45 | 1.701584 | 1.98 |
| weight | 2087 | 86.8587296 | 26.19084708 | 39.00 | 83.101100 | 173.00 |
| fcvc | 2087 | 2.4214662 | 0.53473659 | 1.00 | 2.396265 | 3.00 |
| ncp | 2087 | 2.7011791 | 0.76461449 | 1.00 | 3.000000 | 4.00 |
| ch2o | 2087 | 2.0047494 | 0.60828437 | 1.00 | 2.000000 | 3.00 |
| faf | 2087 | 1.0128119 | 0.85347467 | 0.00 | 1.000000 | 3.00 |
| tue | 2087 | 0.6630354 | 0.60815328 | 0.00 | 0.630866 | 2.00 |
Based on the dataset description, it’s known that 77% of the data was synthetically generated using the Weka tool and the SMOTE filter, while the remaining 23% was directly collected from users through a web platform.
# Group the organic from synthetic data based on the dataset description.
df_organic <- my_data[1:488, ]
df_synthetic <- my_data[489:nrow(my_data), ]
# Display samples from both subsets to understand their characteristics.
cat("Last 3 rows of Organic Data\n")Last 3 rows of Organic Data
kable(tail(df_organic, 3), caption = "Organic Data - Last 3 Rows")| age | gender | height | weight | calc | favc | fcvc | ncp | scc | smoke | ch2o | family_history_with_overweight | faf | tue | caec | mtrans | n_obeyesdad | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 486 | 18 | Female | 1.67 | 66 | Sometimes | yes | 3 | 3 | no | no | 2 | no | 0 | 0 | Sometimes | Public_Transportation | Normal_Weight |
| 487 | 19 | Male | 1.80 | 60 | no | yes | 3 | 1 | yes | no | 1 | yes | 0 | 0 | Always | Motorbike | Normal_Weight |
| 488 | 20 | Male | 1.56 | 45 | Sometimes | no | 2 | 3 | no | no | 2 | no | 1 | 1 | Sometimes | Public_Transportation | Normal_Weight |
cat("First 3 rows of Synthetic Data\n")First 3 rows of Synthetic Data
kable(head(df_synthetic, 3), caption = "Synthetic Data - First 3 Rows")| age | gender | height | weight | calc | favc | fcvc | ncp | scc | smoke | ch2o | family_history_with_overweight | faf | tue | caec | mtrans | n_obeyesdad | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 489 | 25.19621 | Female | 1.686306 | 104.5727 | Sometimes | yes | 3 | 3 | no | no | 1.152736 | yes | 0.319156 | 1.00000 | Sometimes | Public_Transportation | Obesity_Type_III |
| 490 | 18.50334 | Female | 1.683124 | 126.6738 | Sometimes | yes | 3 | 3 | no | no | 1.115967 | yes | 1.541072 | 1.00000 | Sometimes | Public_Transportation | Obesity_Type_III |
| 491 | 26.00000 | Female | 1.622397 | 110.7926 | Sometimes | yes | 3 | 3 | no | no | 2.704507 | yes | 0.000000 | 0.29499 | Sometimes | Public_Transportation | Obesity_Type_III |
VISUALIZATION WITH DATA MANIPULATION
# Analyze and visualize the most common ages, weights, genders
#and heights in the dataset.
total_of_group <- function(data, group_of, top_n) {
result_table <- data %>%
group_by({{ group_of }}) %>%
summarize(Count = n()) %>%
arrange(desc(Count)) %>%
top_n(top_n, wt = Count)
# Calculate peak values
peak_values <- result_table %>%
mutate(ranking = row_number())
ggplot(result_table, aes(x = reorder({{ group_of }}, -Count), y = Count)) +
geom_bar(stat = "identity", fill = "yellow", color = "black") +
geom_text(aes(label = Count), vjust = -0.5, size = 3) +
geom_line(data = peak_values, aes(group = 1, x = reorder({{ group_of }}, -Count), y = Count), color = "red") +
labs(title = paste("Most Common", deparse(substitute(group_of))), x = deparse(substitute(group_of)), y = "Count")+
theme(axis.text.x =element_text(color = "red"),
axis.text.y= element_text(color = "red"),
plot.title = element_text(face = "bold.italic", size = 14,color = "blue",family ="Times New Roman"),
axis.title.x = element_text(face = "bold.italic"),
axis.title.y = element_text(face = "bold.italic") )
}# Let's visualize the ages and also weights.
total_of_group(my_data, age, 20)total_of_group(my_data, weight, 20)THIS TYPE OF GRAPHIC WILL BE IN USE A LOT IN THE REST OF THE PROJECT, IN ORDER TO HAVE A BETTER UNDERSTANDING OF THE RELATIONSHIPS.
NOTE ! : TO SEE THE COUNTS OF OTHER COLUMNS , JUST CLICK HERE.
total_of_group(my_data, height, 20)total_of_group(my_data,mtrans,6)total_of_group(my_data,gender,2)# Group by obesity level smoking , gender and alcohol consumption.
#Because we need these information to understand the relationship between obesity and other factors.
rs <- my_data %>%
group_by(calc,n_obeyesdad,smoke,gender) %>%
summarize(count = n())#Then , let's visualize it.
ggplot(rs, aes(x = smoke, y = count, fill = n_obeyesdad)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(gender~calc ) +
labs(
title = "Relationship Between Alcohol Consumption, Smoking,Gender and Obesity",
subtitle = " ALCOHOL CONSUMPTION ",
x = "SMOKING",
y = "Total Number",
fill = "Obesity Levels"
) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(face = "bold.italic",hjust = 0.5,color = "red"),
axis.text.x = element_text( hjust = 1 , face ="bold.italic",colour ="black"),
axis.text.y = element_text(face ="bold.italic",colour ="black"),
axis.title.x = element_text(face = "bold.italic", color = "red" , family = "Times New Roman" ,size = 14),
axis.title.y = element_text(face = "bold.italic", color = "black" , family = "Times New Roman",size =14),
legend.position = "bottom"
)IN THIS DATASET 2043 PEOPLE DO NOT SMOKE, WHICH MEANS THE NUMBER OF NON-SMOKERS IS QUITE HIGH. IN CONTRAST, ONLY 44 PEOPLE SMOKE. THEREFORE, THE COMPARISON REGARDING SMOKING IS NOT BALANCED SO WELL
total_of_group(my_data,smoke,2)JUST AS ABOVE, THE PEOPLE WHO SAY THEY “ALWAYS” DRINK ARE ONLY “1”. THAT’S WHY THE GRAPH OF “ALWAYS” IS NOT SO BALANCED.
total_of_group(my_data,calc,4)# Create a new feature for activity levels
my_data <- my_data %>%
mutate(
activity_level = case_when(
faf == 0 ~ "No Activity",
faf < 1 ~ "Low Activity",
faf < 3 ~ "Decent Activity",
.default = "High Activity"
)
)
#Grouping again to see the relationship between activity level and transportation preferences.
data <- my_data %>%
group_by(mtrans, activity_level, family_history_with_overweight) %>%
summarise(total = n(), .groups = 'drop')ggplot(data, aes(x = total, y = mtrans, fill = mtrans)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Relation between activity level, preferences of transportation and family history of overweight",
subtitle =" History of overwight from your family ",
x = "Total Count",
y = "Transportation Type"
) +
theme(axis.text.x = element_text(hjust =1,color = "red",face = "bold.italic"),
plot.title = element_text(face = "bold", hjust = 0.5 ,size =10,family = "Times New Roman",color = "blue"),
plot.subtitle = element_text(face = "bold", hjust = 0.5 ,size =8,family = "Times New Roman",color = "red"),
axis.text.y = element_text(angle = 45, hjust = 1, face = "bold.italic"),
legend.position ="bottom"
) +
facet_grid(activity_level ~ family_history_with_overweight)# Density plot for distribution of weight by obesity levels ggplot(my_data, aes(x = weight , fill = n_obeyesdad))+ geom_density(alpha = 0.8) + # Density plot labs(title = "Distribution of Weight by Obesity Levels", x = "Weight", y = "Density") + theme(axis.text.x = element_text(angle = 45, hjust = 1 , face = "bold.italic"), axis.text.y = element_text( face ="bold.italic",colour ="black"))
To see the percentages of obesity levels in this dataset , let’s print another graphic.
# Calculate the percentage of each obesity level
data <- my_data %>%
count(n_obeyesdad) %>%
mutate(percentage = n / sum(n) * 100)# Create the pie chart
ggplot(data, aes(x = "", y = percentage, fill = n_obeyesdad)) +
geom_bar(width = 1, stat = "identity") +
coord_polar(theta = "y", start = pi) + # Start at 180 degrees
labs(fill = "OBESITY LEVELS") +
theme_void() + # Remove background and axis
geom_text(aes(label = sprintf("%.2f%%", percentage)),
position = position_stack(vjust = 0.5), size = 5) +
theme(legend.title = element_text(size = 15),
legend.text = element_text(size = 12))DATA MANIPULATION
To understand what is BMI , please click here
I think everyone knows the formula of BMI.Let me write it again :
\[ BMI = \frac{Weight}{Height^{2}} \]
#Now , it is time to add new features to our dataset.
#First , let's calculate the BMI of every person.
calc_bmi <- \(dataset) {
dataset %>%
mutate(BMI = weight / (height * height) )
}
my_data <- calc_bmi(my_data)
#Let's check if it was added as new column.
my_data %>%
head(3) %>%
datatable()#I want to show relation between family history with overweight relation and obesity level.
relations<-my_data %>%
group_by(n_obeyesdad, family_history_with_overweight,gender) %>%
summarise(count = n())
#As we can see, people who are overweight or have more weight than normal often have a relationship with their families overweight.#LET'S ALSO VISUALIZE THIS :
ggplot(relations, aes(x = gender, y = count, fill = n_obeyesdad)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = round(count, 2)), position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
labs(
title = "Obesity Levels by Family History of Overweight over Gender",
subtitle = " Do you have a history from your family of Overweight ? ",
x = "Gender",
y = "Total Number",
fill = "Obesity Level"
) +
facet_wrap(~family_history_with_overweight)+
theme(axis.text.x = element_text(angle = 45, hjust = 1,face="bold.italic"))# Group by obesity level and summarize mean BMI , weight ,height and age
obesity_summary <- my_data %>%
group_by(n_obeyesdad) %>%
summarise(
mean_BMI = mean(BMI),
mean_age = mean(age),
mean_wgh= mean(weight),
mean_hgh = mean(height),
total_num = n(),
)
obesity_summary %>%
arrange(desc(mean_BMI)) %>%
datatable(
caption = "Summary Statistics by Obesity Level",
options = list(
pageLength = 5,
autiWidth =TRUE
)
)# Convert mean_age to a factor
obesity_summary <- obesity_summary %>%
mutate(mean_age_factor = factor(round(mean_age)))# Visualization
ggplot(obesity_summary, aes(
x = mean_age_factor,
y = mean_BMI,
fill = n_obeyesdad)) +
transition_reveal(along = mean_age) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.5) +
ylab("Mean of Age") +
ggtitle("Mean BMI by Obesity Level") +
labs(title = "Mean of AGE over BMI with Animation",
x = "Mean of Age",
y = "Mean of BMI") +
scale_x_discrete() +
scale_y_continuous(breaks = seq(0, max(obesity_summary$mean_BMI), by = 5)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(family = "Times New Roman", colour = "blue", face = "bold.italic"))# Rearrange columns to move BMI to the front
my_data <- my_data %>%
relocate(BMI, .after = age)
# Display the rearranged data
my_data %>%
head(2) %>%
datatable(caption = "Data with BMI Column Relocated")# Create a new feature: Age group
my_data <- my_data %>%
mutate(
age_group = case_when(
age < 20 ~ "Teenager",
age < 35 ~ "Young Adult",
age < 50 ~ "Middle Age",
.default = "Senior"
)
)
# Group by age group and summarize mean BMI
age_group_summary <- my_data %>%
group_by(age_group) %>%
summarise(
mean_BMI = mean(BMI),
mean_weight = mean(weight),
count = n()
)
# Display the age group summary
age_group_summary %>%
datatable(caption = "Summary Statistics by Age Group")# Plot mean BMI by age group
ggplot(my_data, aes(x = age_group, y = BMI, fill = age_group)) +
geom_violin(alpha = 1) +
labs(
title = "Graph of BMI by Age Group",
subtitle = " The history of obesity in the family",
x = "Age Group",
y = "BODY MASS INDEX"
) +
facet_grid( gender ~ family_history_with_overweight) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1, face = "bold.italic"),
legend.position = "bottom",
legend.title = element_text(face = "bold.italic", size = 10, color = "purple" ),
plot.title = element_text(face = "bold.italic", size = 14)
)Individuals with a family history of obesity tend to have higher BMI levels. This correlation suggests a significant genetic or environmental influence on obesity within families.
# Distribution of BMI by obesity level
ggplot(my_data, aes(x = BMI, fill = n_obeyesdad)) +
geom_density(alpha = 0.8) +
labs(
title = "Distribution of BMI by Obesity Level",
x = "BMI",
y = "Density"
) +
theme_minimal()# Scatter plot of weight vs. height colored by obesity level
ggplot(my_data, aes(x = height * 100, y = weight, color = n_obeyesdad)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm")+
labs(
title = "Weight vs. Height by Obesity Level",
x = "Height (cm)",
y = "Weight (kg)"
) # Additional multi-line plot for summary statistics
obesity_summary_long <- obesity_summary %>%
pivot_longer(cols = c(mean_BMI, mean_age,mean_wgh,total_num,mean_hgh),
names_to = "variable",
values_to = "value")
# Convert n_obeyesdad to a numeric type
#Because it will be used as transition_reveal , it must be numeric to add animation.
obesity_summary_long <- obesity_summary_long %>%
mutate(n_obeyesdad_numeric = as.numeric(factor(n_obeyesdad, levels = unique(n_obeyesdad))))ggplot(obesity_summary_long, aes(x = n_obeyesdad, y = value, color = variable, group = variable)) +
geom_line(size = 1) + # Draw lines
geom_point(size = 3) + # Add points+
transition_reveal(along = n_obeyesdad_numeric) + # Add transition along the x-axis (n_obeyesdad)
labs(
title = "Summary Statistics by Obesity Level",
x = "Obesity Level",
y = "Value",
color = "Variable"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Mean of height is too close to each other between 169 to 171 , that’s why line looks like constant , but actually it is not.There is too small differences.
my_data <- my_data %>%
mutate(fcvc = case_when(
fcvc <1.6 ~ "Never",
fcvc <2.3 ~ "Sometimes",
.default = "Always"
))
st <-my_data %>%
group_by(n_obeyesdad,fcvc,favc) %>%
summarize(count = n())ggplot(st, aes(x = count, y = n_obeyesdad, fill = n_obeyesdad)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(fcvc ~ favc) + #rows for fcvc , columns with facv
labs(
title = "Relations With High Calorie Eater , Vegetable Frequency, and Obesity Levels",
subtitle = " DO YOU EAT HIGH CALORIE FOOD ?",
y= "The frequency of eating vegetable",
x = "Total Number",
fill = "Obesity Level"
) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
axis.text.y= element_text(angle = 30, hjust = 1,size=8),
legend.position = "bottom"
)The strange thing about this graph is that in the high calorie diet group, those who ate more vegetables appeared more obese than those who ate fewer vegetables… To be honest , I didn’t expect that result.It may be cause of the distribution of choices is not similar to each other.
total_of_group(my_data,fcvc,15)# Visualize the distribution of activity levels by obesity level
activity_summary <- my_data %>%
group_by(n_obeyesdad, activity_level) %>%
summarise(count = n(), .groups = 'drop')ggplot(activity_summary, aes(x = activity_level, y = count, fill = n_obeyesdad)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Activity Levels by Obesity")#Before start the analyse , I should make the values of mutate integer , otherwise there are plenty of float numbers.
my_data <- my_data %>%
mutate(tue = case_when(
tue < 0.5 ~ 0,
tue < 1.5 ~ 1,
tue < 2.5 ~ 2,
.default = 3
))
my_data <- my_data %>%
mutate(Tech_Device_Hours = case_when(
tue <1 ~ "0-2 Hours",
tue <2 ~ "3-5 Hours",
.default = "More than 5 hours" ))
statistic <- my_data %>%
group_by(Tech_Device_Hours, activity_level) %>%
summarize(count = n(), .groups = 'drop')ggplot(statistic, aes(x = activity_level, y = count)) +
geom_bar(stat = "identity", fill = "blue") +
transition_states(states = Tech_Device_Hours, transition_length = 2, state_length = 1) +
labs(
title = "Activity Level Distribution by Tech Device Hours per day",
subtitle = 'Tech Device Hours: {closest_state}', # {closest_state} dynamically updates to reflect the current Tech_Device_Hours state in the animation
x = "Activity Level",
y = "Total numbers"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1 , face = "bold.italic" , color = "black"),
plot.subtitle = element_text(face = "bold.italic" , color = "red",family = "Times New Roman")
)Hyperlink
You can find more information on obesity on the World Health Organization’s website.
REFERENCES FROM references.bib
REFERENCES FROM references.bib
World Health Organization. Obesity and overweight. Retrieved from WHO (World Health Organization 2021).
Smith, J. (2019). The Growing Concerns of Obesity in Modern Society. Journal of Health, 10(2), 123-134. (Smith 2019)
Brown, L., & Green, M. (2020). The impact of family history on obesity: A review. Journal of Obesity Research, 22(3), 234-245 (Brown and Green 2020)
Johnson, E. (2018). Nutrition and Obesity. Health Press. (Johnson 2018)
Centers for Disease Control and Prevention. (2020). Adult Obesity Facts. (Centers for Disease Control and Prevention 2020)